BIO721P Genome-Bioinformatics

Genome assembly

(and quality assessment)

bmpvieira.com/assembly15


Bruno Vieira | @bmpvieira

PhD Student @ QMUL

Bioinformatics and Population Genomics


Supervisor:
Yannick Wurm | @yannick__




© 2015 Bruno Vieira CC-BY 4.0


Useful books

Papers




Chen 2011


Part I - Manual genome assembly


Part II - Reads quality assessment and cleaning


FastQC

FastQC Documentation



Diginorm

"(...)systematizes coverage in shotgun sequencing data sets, thereby decreasing sampling variation, discarding redundant data, and removing the majority of errors."


Diginorm

"(...)reduces the size of shotgun data sets and decreases the memory and time requirements for de novo sequence assembly, all without significantly impacting content of the generated contigs."

Magic? No, Bloom filters
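
A minimal sketch of the idea behind digital normalization, assuming the median k-mer count rule: a read is kept only while the median count of its k-mers is still below a coverage cutoff. The function names, the tiny k and the cutoff are illustrative assumptions. An exact Counter stands in for the memory-efficient probabilistic counting structure that real tools use, so this shows the logic, not the memory savings.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every k-mer of seq, in order (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k=4, cutoff=2):
    """Keep a read only if the median count of its k-mers is below cutoff.

    k and cutoff are toy values chosen for illustration (assumption);
    real data sets use much larger values.
    """
    counts = Counter()  # exact counts; a stand-in for a probabilistic structure
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate coverage from
        estimated_coverage = median(counts[km] for km in read_kmers)
        if estimated_coverage < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to the counts
    return kept


reads = ["ACGTACGT"] * 5 + ["TTTTGGGG"]
normalized = digital_normalization(reads)
```

Redundant copies of a highly covered read are discarded once its k-mers reach the cutoff, while reads from regions not yet seen are always kept.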


Diginorm

What is digital normalization, anyway?

Why you shouldn't use digital normalization


Fasta


Fastq


Interleaved format
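
A small sketch of what interleaving means for paired-end data, assuming four-line FASTQ records and that both files list the pairs in the same order. The helper names and the example read names are hypothetical. Each forward record is followed directly by its reverse mate.

```python
from itertools import islice


def fastq_records(lines):
    """Yield FASTQ records as lists of 4 lines (header, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward_lines, reverse_lines):
    """Return the lines of an interleaved FASTQ: forward read, then its mate."""
    out = []
    pairs = zip(fastq_records(forward_lines), fastq_records(reverse_lines))
    for forward, reverse in pairs:
        out.extend(forward)
        out.extend(reverse)
    return out


# Toy records; read names and qualities are made up for illustration.
r1 = ["@read1/1", "ACGT", "+", "IIII", "@read2/1", "GGCC", "+", "IIII"]
r2 = ["@read1/2", "TTAA", "+", "IIII", "@read2/2", "CCGG", "+", "IIII"]
interleaved = interleave(r1, r2)
```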


Practical

Part II


Part III - Assembling reads


Types

Algorithms

Strategies




Assembly paradigms


Overlap/Layout/Consensus


Overlap/Layout/Consensus



Chen 2011


de Bruijn


de Bruijn


Chen 2011
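
A minimal sketch of how a de Bruijn graph can be built from reads, assuming error-free reads and a single strand. Each k-mer becomes an edge from its first k-1 bases to its last k-1 bases. The function name and the toy read are illustrative assumptions; real assemblers also handle reverse complements, sequencing errors and graph simplification.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it.

    Repeated k-mers produce repeated edges, so this is a multigraph.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)


graph = de_bruijn_graph(["ACGT"], k=3)
```

Assembly then amounts to finding a path through the graph that uses the edges, which is why repeats longer than k create ambiguous branches.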




Schatz 2012




Schatz 2012


Too many assemblers

seqanswers.com/wiki/De-novo_assembly



A5, ABySS, ALLPATHS, CABOG, CLCbio, Contrail, Curtain, DecGPU, Forge, Geneious, GenoMiner, IDBA, Lasergene, MIRA, Newbler, PE-Assembler, QSRA, Ray, SeqMan NGen, SeqPrep, Sequencher, SHARCGS, SHORTY, SHRAP, SOAPdenovo, SR-ASM, SuccinctAssembly, SUTTA, Taipan, VCAKE, Velvet


Benchmarking



Why we need the assemblathon


Assembly quality assessment


Assembly quality assessment



Assembly quality assessment


N50 must die?
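
For reference, N50 is the contig length L such that contigs of length L or longer contain at least half of the total assembled bases. A short sketch, with a hypothetical function name and made-up contig lengths:

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


example = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

Because N50 only looks at lengths, an assembler that wrongly joins contigs gets a higher N50, so the number says nothing about correctness on its own.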


Assembly quality assessment

source


Assembly quality assessment

source



Practical

Part III


Part IV - Try manual assembly again? (optional/homework)

